TESTS FOR DEPENDENCE¹
by 

Myles  Hollander 
The  Florida  State  University 

Abstract 

This paper is prepared as an invited entry for the Encyclopedia of Statistical
Sciences, to be edited by Samuel Kotz and Norman L. Johnson and to be published by
John Wiley & Sons. It is designed to provide a sound introduction for a reasonably
well-informed reader who is, however, not a specialist in tests for dependence. The
paper contains references to many tests but emphasizes the parametric test of
independence based on Pearson's sample correlation coefficient r and certain nonparametric
tests based on ranks. The rank tests are generally preferable to the test based on
r in that they have wider applicability, are much less sensitive to outlying
observations, are exact under mild assumptions which do not require an underlying bivariate
normal population, and have good efficiency (power) properties.

1.  Introduction 

Many studies are designed to explore the relationship between two random
variables X and Y, say, and specifically to determine whether X and Y are independent
or dependent. Some particular examples are:

(i)  Obesity  and  blood  pressure:  Are  obesity  and  blood  pressure  independent 
or,  for  example,  do  men  who  are  overweight  also  tend  to  have  high  blood  pressure? 

Here  X could  be  the  degree  of  overweight  as  measured  by  the  ratio  of  actual  body 
weight  to  ideal  body  weight  as  given  in  certain  standard  tables,  and  Y could  be 
systolic  blood  pressure. 

[Footnote 1, partially recovered: ... under Grant AFOSR-78-3678. The United States Government is authorized to reproduce ...]

(ii) Color and taste of tuna: Are color and quality of canned tuna independent
or perhaps do consumers tend to prefer light tuna? Here X could be a measure of
lightness and Y could be a quality score determined by a consumer panel.

(iii) Infants walking and their IQ: Is the time it takes an infant to walk
alone independent of the infant's IQ at a later age, or do children who learn to walk
early tend to have higher IQs? Here X could be the number of days measured from birth
until the infant walks alone, and Y could be the infant's IQ score at age 5.

(iv) System reliability and the environment: Is the life length X (say) of a
specific system independent of a certain characteristic of the environment, for example,
the temperature Y, within which the system operates, or do high temperatures tend to
shorten the life length?

One can test the null hypothesis that the two variables X and Y are independent,
against alternatives of dependence, using a random sample from the underlying bivariate
population. We suppose that such a sample of size n is available, and we denote the
sample by $(X_1,Y_1), (X_2,Y_2), \ldots, (X_n,Y_n)$. Our assumptions are

A1. The n bivariate observations $(X_1,Y_1), \ldots, (X_n,Y_n)$ are mutually independent.

A2. Each $(X_i,Y_i)$ comes from the same bivariate population with continuous distribution
function $H(x,y) = P(X \le x, Y \le y)$ and continuous marginal distributions $F(x) = P(X \le x)$ and
$G(y) = P(Y \le y)$.

The hypothesis of independence asserts that

$$H_0:\ H(x,y) = F(x)\,G(y), \quad \text{for all } (x,y), \tag{1}$$

that is, the variables X and Y are independent. Under $H_0$, all 2n random variables
$X_1,\ldots,X_n,Y_1,\ldots,Y_n$ are mutually independent, that is

$$P(X_1 \le x_1, \ldots, X_n \le x_n, Y_1 \le y_1, \ldots, Y_n \le y_n) = \prod_{i=1}^{n} F(x_i) \prod_{i=1}^{n} G(y_i).$$

When we discuss alternatives to $H_0$, we will be assuming that X and Y are dependent
so that (1) fails to hold, but we still insist that the independence between
the n pairs is preserved.

The organization of this paper is as follows. In Section 2 we present the
classical test of $H_0$ based on Pearson's correlation coefficient r. This test assumes,
in addition to A1 and A2, that the underlying population is bivariate normal. Section
3 presents rank tests of $H_0$ which do not require the assumption of normality. These
rank tests have additional advantages, relative to the test based on r, including
wider applicability, relative insensitivity to outlying observations, and desirable
efficiency (power) properties. Section 4 illustrates various tests using data
relating to color and taste of tuna.

2.  A Test  Based  on  Pearson's  Correlation  Coefficient 

The Pearson correlation coefficient r, proposed by the eminent statistician
Karl Pearson in 1896, is

$$r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\left[\sum_{i=1}^{n}(X_i-\bar{X})^2 \sum_{i=1}^{n}(Y_i-\bar{Y})^2\right]^{1/2}}, \tag{2}$$

where $\bar{X}$ and $\bar{Y}$ are the sample means. The statistic r is the sample correlation
coefficient and is an estimator of the corresponding population parameter $\rho$, the
correlation coefficient of the bivariate population defined by H(x,y). Specifically,

$$\rho = \frac{E\{[X-E(X)][Y-E(Y)]\}}{\sigma_X \sigma_Y}, \tag{3}$$

where E denotes expectation, $\sigma_X$ is the standard deviation of the X population,
and $\sigma_Y$ is the standard deviation of the Y population. It can be shown that for all
samples $-1 \le r \le 1$, and for all bivariate populations $-1 \le \rho \le 1$. When $\rho > 0$, this
may be interpreted as X and Y being positively associated (as measured by $\rho$) and
$\rho < 0$ may be interpreted as X and Y being negatively associated (as measured by $\rho$).
Assuming H(x,y) is a bivariate normal cumulative distribution function with
correlation $\rho$, an exact $\alpha$ level test of $H_0$ versus $\rho \ne 0$ is

    reject $H_0$ in favor of $\rho \ne 0$ if $|T| \ge t_{\alpha/2,\,n-2}$,
    accept $H_0$ if $|T| < t_{\alpha/2,\,n-2}$,                                  (4)

where $t_{\alpha/2,\,n-2}$ is the upper $\alpha/2$ percentile point of Student's t distribution with
n-2 degrees of freedom, and

$$T = (n-2)^{1/2}\, r / (1-r^2)^{1/2}. \tag{5}$$

Since |T| is an increasing function of |r|, the test defined by (4) is equivalent
to the test which rejects for large values of |r|, and the latter is easily derived
to be the likelihood ratio test of $H_0$ versus $\rho \ne 0$ in the model which assumes
bivariate normality. (Of course under the bivariate normality assumption, X and Y
are independent if and only if $\rho = 0$.)


One-sided tests based on T are readily defined. To test $H_0$ versus $\rho > 0$, at
the $\alpha$ level, reject $H_0$ if $T \ge t_{\alpha,\,n-2}$ and accept $H_0$ if $T < t_{\alpha,\,n-2}$. To test $H_0$
versus $\rho < 0$, at the $\alpha$ level, reject $H_0$ if $T \le -t_{\alpha,\,n-2}$ and accept $H_0$ if $T > -t_{\alpha,\,n-2}$.

The two-sided test defined by (4), and the corresponding one-sided tests, are
exact (i.e., have true Type I error probability equal to the nominal value $\alpha$)
only when the underlying population is bivariate normal. Approximate (for large n)
tests of $H_0$ which do not require the assumption of bivariate normality treat T as
a standard normal random variable under $H_0$.
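
As a rough illustration of formulas (2) and (5), the following minimal sketch computes r and T in plain Python. The function names and the five-point data set are hypothetical and chosen only for illustration; the resulting T would still have to be referred to a t table with n-2 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r, as in equation (2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """T of equation (5); compare with Student's t on n - 2 degrees of freedom."""
    return math.sqrt(n - 2) * r / math.sqrt(1 - r * r)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
T = t_statistic(r, len(x))
print(round(r, 3), round(T, 3))
```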


For more information on testing independence in this parametric context, see
Bickel and Doksum (1977, Section 6.5.A). Devlin, Gnanadesikan, and Kettenring (1975)
point out that r is very sensitive to outliers and consider the related problem of
robust estimation and outlier detection with correlation coefficients.

In Section 3 we present nonparametric tests of $H_0$ which are exact without
requiring the assumption of bivariate normality.

3.  Rank  Tests  of  Independence 

Let $R_i$ be the rank of $X_i$ in the joint ranking from least to greatest of
$X_1,\ldots,X_n$ and let $S_i$ be the rank of $Y_i$ in the (separate) joint ranking from least to
greatest of $Y_1,\ldots,Y_n$.

Under assumptions A1 and A2 and $H_0$, the vector of X ranks $R = (R_1,\ldots,R_n)$
is independent of the vector of Y ranks $S = (S_1,\ldots,S_n)$, and both R and S have
uniform distributions over the space P of the n! permutations $(i_1,\ldots,i_n)$ of the integers
(1,...,n). That is, for each permutation $(i_1,\ldots,i_n)$,

$$P_0\{(R_1,\ldots,R_n) = (i_1,\ldots,i_n)\} = 1/n!,$$

with the same result holding for $(S_1,\ldots,S_n)$. (The subscript 0 indicates the
probability is computed under $H_0$.) It follows that rank statistics (i.e., statistics
which are solely based on R and S) are distribution-free under $H_0$.

One important class of rank statistics for testing $H_0$ is the class of linear rank
statistics of the form

$$L = \sum_{i=1}^{n} a(R_i)\, b(S_i), \tag{6}$$

where the "scores" $a(R_i)$, $b(S_i)$ satisfy $a(1) \le \cdots \le a(n)$, $b(1) \le \cdots \le b(n)$.

Test based on Spearman's rank correlation coefficient: Making the choice
a(i) = b(i) = i in (6), L reduces to

$$M = \sum_{i=1}^{n} R_i S_i. \tag{7}$$

Then if M is linearly transformed so that the minimum and maximum values are
-1 and 1, we obtain Spearman's rank order correlation coefficient

$$r_s = \frac{12 \sum_{i=1}^{n} [R_i-(n+1)/2][S_i-(n+1)/2]}{n(n^2-1)}. \tag{8}$$

An even simpler formula for computational purposes is

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n^3-n}, \tag{9}$$

where $D_i = R_i - S_i$. Note also that $r_s$ is obtainable from r (2) by replacing
$X_i$ with its X-rank $R_i$ and $Y_i$ with its Y-rank $S_i$.
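
The agreement between formulas (8) and (9) can be checked numerically. The sketch below is only an illustration: the function names and the two rank vectors are hypothetical, and the ranks are assumed to contain no ties.

```python
def spearman_centered(R, S):
    """r_s from equation (8): centered ranks, scaled to lie in [-1, 1]."""
    n = len(R)
    c = (n + 1) / 2
    total = sum((r - c) * (s - c) for r, s in zip(R, S))
    return 12 * total / (n * (n * n - 1))

def spearman_d2(R, S):
    """r_s from equation (9): based on the squared rank differences D_i."""
    n = len(R)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    return 1 - 6 * d2 / (n ** 3 - n)

# Hypothetical rank vectors (no ties), for illustration only.
R = [1, 2, 3, 4, 5]
S = [2, 1, 4, 3, 5]
print(spearman_centered(R, S), spearman_d2(R, S))
```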

ii  l 

The statistic $r_s$ does not estimate $\rho$ as given in (3) but rather it estimates
the population parameter

$$\rho_s = 6P\{(X_1-X_2)(Y_1-Y_3) > 0\} - 3. \tag{10}$$

It can be shown that for all samples $-1 \le r_s \le 1$, and for all bivariate populations
$-1 \le \rho_s \le 1$. Note that

$$P\{(X_1-X_2)(Y_1-Y_3) > 0\} = P(X_1>X_2, Y_1>Y_3) + P(X_1<X_2, Y_1<Y_3)$$

and when $H_0$ is true

$$P\{(X_1-X_2)(Y_1-Y_3) > 0\} = P(X_1>X_2)P(Y_1>Y_3) + P(X_1<X_2)P(Y_1<Y_3) = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2},$$

so that when $H_0$ is true, $\rho_s = 0$. In addition, $\rho_s > 0$ may be interpreted as X and
Y being positively associated (as measured by $\rho_s$), and $\rho_s < 0$ may be interpreted as
X and Y being negatively associated (as measured by $\rho_s$). (For further information
and interpretation of the parameter $\rho_s$ as a measure of association see Kruskal (1958).)
Under assumptions A1 and A2, an exact $\alpha$ level test of $H_0$ versus $\rho_s \ne 0$ is

    reject $H_0$ in favor of $\rho_s \ne 0$ if $|r_s| \ge r_s(\alpha/2,n)$,
    accept $H_0$ if $|r_s| < r_s(\alpha/2,n)$,                                  (11)

where $r_s(\alpha/2,n)$ is the upper $\alpha/2$ percentile point of the null distribution of
$r_s$. To test $H_0$ vs. the one-sided alternative $\rho_s > 0$, at the $\alpha$ level, reject $H_0$
if $r_s \ge r_s(\alpha,n)$ and accept $H_0$ otherwise. To test $H_0$ vs. $\rho_s < 0$, at the $\alpha$ level,
reject $H_0$ if $r_s \le -r_s(\alpha,n)$ and accept $H_0$ otherwise.
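
Because the rank vectors are uniformly distributed over the n! permutations under the null hypothesis, the exact null distribution of $r_s$ can in principle be obtained by brute-force enumeration when n is small. The following sketch illustrates the idea for the one-sided (upper tail) test; the function names are hypothetical, ties are assumed absent, and the approach is practical only for small n since it visits all n! permutations.

```python
from itertools import permutations

def spearman_from_ranks(R, S):
    """r_s via equation (9), assuming untied ranks 1..n."""
    n = len(R)
    d2 = sum((r - s) ** 2 for r, s in zip(R, S))
    return 1 - 6 * d2 / (n ** 3 - n)

def exact_upper_p(R, S):
    """P0(r_s >= observed value), by enumerating all n! equally likely rank vectors."""
    n = len(R)
    observed = spearman_from_ranks(R, S)
    base = tuple(range(1, n + 1))
    count = total = 0
    for perm in permutations(base):
        total += 1
        # Small tolerance guards against floating-point round-off in the comparison.
        if spearman_from_ranks(base, perm) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical ranks for n = 3: the six permutations give r_s = 1, .5, .5, -.5, -.5, -1.
print(exact_upper_p([1, 2, 3], [1, 3, 2]))
```

With n = 3 the observed value r_s = .5 is equaled or exceeded by three of the six permutations, so the sketch prints 0.5.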

From (9) we see that tests based on $r_s$ are equivalent to tests based on the
statistic $\sum D_i^2$. Glasser and Winter give critical values of $r_s$ and $\sum D_i^2$ for n = 4(1)50.
Tables of the complete null distribution of $r_s$ and $\sum D_i^2$ are given for n = 4(1)11 in
Kraft and van Eeden (1968).

Under $H_0$, $E(r_s) = 0$, $\mathrm{Var}(r_s) = 1/(n-1)$, and as n gets large, the distribution
of $(n-1)^{1/2} r_s$ tends to the standard normal distribution. Thus approximate (for large
n) tests of $H_0$ can be obtained by treating

$$r_s^* = (n-1)^{1/2}\, r_s \tag{12}$$

as a standard normal variable under $H_0$.

Test based on Kendall's rank correlation coefficient: Kendall's rank correlation
coefficient can be written as

$$r_k = 2 \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \xi(X_i,X_j,Y_i,Y_j) \big/ \{n(n-1)\}, \tag{13}$$

where $\xi(a,b,c,d) = 1$ if $(a-b)(c-d) > 0$, and $= -1$ if $(a-b)(c-d) < 0$. When
$(X_i-X_j)(Y_i-Y_j) > 0$ we say the pairs $(X_i,Y_i)$, $(X_j,Y_j)$ are concordant and when
$(X_i-X_j)(Y_i-Y_j) < 0$ we say the pairs are discordant. Note that $r_k$ is a rank statistic
($\xi(X_i,X_j,Y_i,Y_j) = \xi(R_i,R_j,S_i,S_j)$) so that one only needs the ranks to compute $r_k$,
but it is not a linear rank statistic. However, it can be shown (cf. Hajek and Sidak,
1967, Section II.5.1) that, up to a multiplicative constant, Spearman's $r_s$ is the
"projection" of Kendall's $r_k$ into the family of linear rank statistics. The statistic
$r_k$ estimates the parameter $\tau = 2P\{(X_1-X_2)(Y_1-Y_2) > 0\} - 1$. It can be shown that for
all samples $-1 \le r_k \le 1$, and for all bivariate populations $-1 \le \tau \le 1$. When $H_0$ is true,
$\tau = 0$. In addition, $\tau > 0$ may be interpreted as X and Y being positively associated
(as measured by $\tau$), and $\tau < 0$ may be interpreted as X and Y being negatively associated
(as measured by $\tau$). The reader should note that $\tau$ is analogous to the parameter $\rho_s$ (10)
estimated by Spearman's $r_s$. For details of the relationship between $\rho_s$ and $\tau$, see
Kruskal (1958).

From (13) we see that tests based on $r_k$ are equivalent to tests based on

$$K = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \xi(X_i,X_j,Y_i,Y_j). \tag{14}$$

Under assumptions A1 and A2, an exact $\alpha$ level test of $H_0$ versus $\tau \ne 0$ is

    reject $H_0$ in favor of $\tau \ne 0$ if $|K| \ge k(\alpha/2,n)$,
    accept $H_0$ if $|K| < k(\alpha/2,n)$,                                  (15)

where $k(\alpha/2,n)$ is the upper $\alpha/2$ percentile point of the null distribution of K.
To test $H_0$ vs. $\tau > 0$, at the $\alpha$ level, reject $H_0$ if $K \ge k(\alpha,n)$ and accept $H_0$ otherwise.
To test $H_0$ vs. $\tau < 0$, at the $\alpha$ level, reject $H_0$ if $K \le -k(\alpha,n)$ and accept $H_0$ otherwise.
Kaarsemaker and van Wijngaarden (1953) give tables of the null distribution of K for
n = 4(1)40. See also Table A.21 of Hollander and Wolfe (1973). Extended tables up to
n = 100 are made available on request by D.J. Best, see Best (1973).

Under $H_0$, $E(K) = 0$, $\mathrm{Var}(K) = n(n-1)(2n+5)/18$, and as n gets large, the standardized
distribution of K tends to the standard normal distribution. Thus approximate (for
large n) tests of $H_0$ can be obtained by treating

$$K^* = K / [n(n-1)(2n+5)/18]^{1/2} \tag{16}$$

as a standard normal variable under $H_0$.
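
A minimal sketch of the computations in (13), (14), and (16) follows. The function names and the small data set are hypothetical. One labeled assumption: the source defines $\xi$ only for untied pairs, and this sketch returns 0 for a tied pair, which is a common convention but is not taken from the text.

```python
import math

def xi(a, b, c, d):
    """+1 for a concordant pair, -1 for a discordant pair (0 for ties: an assumption)."""
    p = (a - b) * (c - d)
    return (p > 0) - (p < 0)

def kendall_K(x, y):
    """K of equation (14): concordant pairs minus discordant pairs."""
    n = len(x)
    return sum(xi(x[i], x[j], y[i], y[j])
               for i in range(n - 1) for j in range(i + 1, n))

def kendall_rk(x, y):
    """r_k of equation (13)."""
    n = len(x)
    return 2 * kendall_K(x, y) / (n * (n - 1))

def kendall_K_star(K, n):
    """K* of equation (16), treated as standard normal under H0 for large n."""
    return K / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
K = kendall_K(x, y)
print(K, kendall_rk(x, y), round(kendall_K_star(K, len(x)), 3))
```

Here 8 of the 10 pairs are concordant and 2 are discordant, so K = 6 and r_k = .6.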

Ties: Although assumption A2 precludes the possibility of ties, ties may
occur in practice. One method of treating ties, when dealing with rank statistics,
is to replace $R_i$ by $R_i^*$ (the average of the ranks that $X_i$ is tied for), $S_i$ by $S_i^*$
(the average of the ranks that $Y_i$ is tied for), compute the rank statistic using
the $R^*$'s and $S^*$'s, and refer it to the appropriate null distribution tables
derived under the assumption of continuity. This, however, yields only an approximate,
rather than an exact test.
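
The average-rank (midrank) device described above might be sketched as follows. The function name and the sample values are hypothetical; the function simply assigns to each group of tied observations the average of the ranks that group occupies.

```python
def midranks(values):
    """Ranks from least to greatest, with tied observations given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of observations tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + 1 + end + 1) / 2  # mean of ranks pos+1, ..., end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks

# Hypothetical values: the two 20s share ranks 2 and 3, so each gets 2.5.
print(midranks([10, 20, 20, 30]))
```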

Exact conditional tests, in the presence of ties, can be performed but they
are computationally tedious. See, for example, Lehmann (1975, Section 7.3). For
more information on ties, see Hajek (1969, Chapter VII).

Advantages of rank tests: Advantages of rank tests, as compared to the
parametric test based on r, include:

1. Wider applicability - To compute a rank statistic, we need only know the
ranks, rather than the actual observations.

2. Outlier insensitivity - Rank statistics are less sensitive than r to wildly
outlying observations.

3. Exactness - Tests based on rank statistics are exact under the mild
assumptions A1 and A2, whereas the significance test based on r is exact only when H(x,y)
is bivariate normal.

4. Good efficiency properties - Rank tests of $H_0$ are only slightly less
efficient than the normal theory test based on r when the underlying bivariate
population is normal (the home court of r), and they can be mildly or wildly more efficient
than r when the underlying bivariate population is not normal. Of course, the
efficiency question is complicated as it depends both on the specific rank test under
consideration and the specific measure of efficiency used. Roughly speaking, for
large n and dependency alternatives "close" to the null hypothesis, the tests based on
$r_s$ and $r_k$ sacrifice nine percent of the information in the sample, as compared to
the test based on r, when the underlying population is bivariate normal, and can
be much more efficient for certain non-normal populations. For more details on
efficiency and power, see Lehmann (1975, Section 7.5E) and Hajek and Sidak (1967,
Section VII.2.4), and the references therein.

Other rank tests: A "normal scores" rank test studied by Fieller and Pearson
(1961) and Bhuchongkul (1964) is particularly noteworthy. The normal scores test
statistic for independence is a linear rank statistic of the form (6) with a(i) =
b(i) = $E V_n^{(i)}$, where $V_n^{(1)} < \cdots < V_n^{(n)}$ is an ordered sample of n observations
from the standard normal distribution. For a suitable choice of the definition
of efficiency and a suitable choice of the nature of dependency alternatives, the
normal scores test of independence and the test based on r are equally efficient
under "normality" and Srivastava (1973) has shown that the normal scores test is more
efficient than the test based on r for "all" (i.e. subject to mild regularity)
other cases.
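
To make the normal scores concrete, the sketch below approximates the expected standard normal order statistics by simple trapezoidal integration of the order-statistic density and then forms the linear rank statistic (6). This is only one possible way of obtaining the scores (published tables are another); the function names, the grid size, and the integration limits are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def expected_normal_order_stat(i, n, grid=4001, lim=8.0):
    """Approximate E V_n^(i), the mean of the i-th smallest of n standard normals."""
    nd = NormalDist()
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))
    h = 2 * lim / (grid - 1)
    total = 0.0
    for k in range(grid):
        x = -lim + k * h
        F = nd.cdf(x)
        # Integrand: x times the density of the i-th order statistic.
        f = x * coef * F ** (i - 1) * (1 - F) ** (n - i) * nd.pdf(x)
        total += f * (0.5 if k in (0, grid - 1) else 1.0)
    return total * h

def normal_scores_statistic(R, S):
    """Linear rank statistic (6) with a(i) = b(i) = E V_n^(i); assumes untied ranks."""
    n = len(R)
    score = {i: expected_normal_order_stat(i, n) for i in range(1, n + 1)}
    return sum(score[r] * score[s] for r, s in zip(R, S))

# For n = 2 the larger of two standard normals has mean 1/sqrt(pi), about 0.5642.
print(round(expected_normal_order_stat(2, 2), 4))
```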

References to other nonparametric tests of independence can be found in Sections
3.1 and 10.2 of Hollander and Wolfe (1973) and in Section 7.5D of Lehmann (1975).

4.  Example 

The following example is based on data of Rasekh, Kramer, and Finch (1970) in
a study designed to ascertain the relative importance of the various factors
contributing to tuna quality and to find objective methods for determining quality parameters
and consumer preference. Table 1 gives values of the Hunter L measure of lightness,
along with panel scores for nine lots of canned tuna. The original consumer panel
scores of excellent, very good, good, fair, poor, and unacceptable were converted
to the numerical values of 6, 5, 4, 3, 2, and 1, respectively. The panel scores in Table 1
are averages of 80 such values. The Y random variable is thus discrete, and hence
the continuity portion of assumption A2 is not satisfied. Nevertheless, since each
Y is an average of 80 values, we need not be too nervous about this departure from
assumption A2.

It is suspected that the Hunter L value is positively associated with the
panel score. Thus we will illustrate the one-sided tests of $H_0$ versus positive
association, based on r, $r_s$, and $r_k$. The reader will soon see that all three tests
reach the same conclusion, i.e., there is positive association between the Hunter
L value and the panel score.

Table 1. Hunter L values and consumer panel scores
for nine lots of canned tuna.

Lot    Hunter L Value (X)    Panel Score (Y)
 1            44.4                 2.6
 2            45.9                 3.1
 3            41.9                 2.5
 4            53.3                 5.0
 5            44.7                 3.6
 6            44.1                 4.0
 7            50.7                 5.2
 8            45.2                 2.8
 9            60.1                 3.8

Source. J. Rasekh, A. Kramer, and R. Finch (1970).


Test based on r: From Table 1, we easily calculate $\sum X_i Y_i = 1584.88$,
$\sum X_i = 430.3$, $\sum Y_i = 32.6$, $(\sum X_i)^2 = 185158.09$, $(\sum Y_i)^2 = 1062.76$, $\sum X_i^2 = 20843.11$,
$\sum Y_i^2 = 125.90$, and from (2) and (5) with n = 9, we obtain r = .57 and T = 1.84.
Referring T = 1.84 to a t-distribution with 7 degrees of freedom yields a
one-sided P value of .054. Thus the test based on r leads to the conclusion that
the Hunter L lightness variable and the panel score variable are positively
associated.

The large sample approximation refers T = 1.84 to the standard normal
distribution yielding an approximate P value of .034.
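
The sums quoted above, and the resulting r and T, can be reproduced from the Table 1 data with a short sketch. It uses the computational (sums of products) form of the correlation coefficient, which is algebraically equivalent to the usual definition; the variable names are chosen here for illustration.

```python
import math

# Table 1: Hunter L values (X) and panel scores (Y) for the nine lots.
X = [44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1]
Y = [2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8]
n = len(X)

sum_x, sum_y = sum(X), sum(Y)
sum_xx = sum(a * a for a in X)
sum_yy = sum(b * b for b in Y)
sum_xy = sum(a * b for a, b in zip(X, Y))

# Computational form of r, then T from equation (5).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
T = math.sqrt(n - 2) * r / math.sqrt(1 - r * r)

print(round(sum_xy, 2), round(sum_x, 1), round(sum_y, 1))
print(round(r, 2), round(T, 2))
```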


Test based on $r_s$: We use Table 2 to illustrate the computation of $r_s$.

Table 2. Computation of $r_s$ for the canned tuna data

Lot     R     S     D     D^2
 1      3     2     1      1
 2      6     4     2      4
 3      1     1     0      0
 4      8     8     0      0
 5      4     5    -1      1
 6      2     7    -5     25
 7      7     9    -2      4
 8      5     3     2      4
 9      9     6     3      9

                   $\sum D^2 = 48$

From (9) with n = 9 we obtain

$$r_s = 1 - \frac{6(48)}{(9)^3 - 9} = .60.$$

Referring $r_s$ = .60 to Table J of Kraft and van Eeden (1968) yields a one-sided P
value of .048. Thus the test based on $r_s$ leads to the conclusion that the Hunter L
lightness variable and the panel score variable are positively associated.

From (12) we see that the large sample approximation refers $r_s^* = (8)^{1/2}(.6) =
1.70$ to the standard normal distribution yielding an approximate P value of .045.
This is in good agreement with the exact P value of .048 based on $r_s$.
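
The rank computations for the tuna data can be retraced with the following sketch, which ranks the Table 1 values, forms the squared rank differences, and applies (9) and (12). The helper `ranks` is a hypothetical name and assumes no ties, which holds for these data; the normal tail probability is obtained from the complementary error function.

```python
import math

# Table 1: Hunter L values (X) and panel scores (Y) for the nine lots.
X = [44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1]
Y = [2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8]
n = len(X)

def ranks(values):
    """Ranks from least to greatest; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

R, S = ranks(X), ranks(Y)
d2 = sum((r - s) ** 2 for r, s in zip(R, S))      # sum of squared rank differences
rs = 1 - 6 * d2 / (n ** 3 - n)                    # equation (9)
rs_star = math.sqrt(n - 1) * rs                   # equation (12)
p_approx = 0.5 * math.erfc(rs_star / math.sqrt(2))  # upper-tail standard normal probability

print(R, S, d2)
print(round(rs, 2), round(rs_star, 2), round(p_approx, 3))
```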



Test based on $r_k$: Table 3 contains the $\xi(X_i,X_j,Y_i,Y_j)$ values used to compute
$r_k$. For example, the i = 2, j = 5 entry in Table 3 is a "-1" because $X_2 > X_5$
and $Y_2 < Y_5$, yielding $(X_2 - X_5)(Y_2 - Y_5) < 0$ and thus $\xi(X_2,X_5,Y_2,Y_5) = -1$. Summing
the 1's and -1's of Table 3 yields K = 16 and from (13), $r_k$ = .44. Referring
K = 16 to Table A.21 of Hollander and Wolfe (1973) yields a one-sided P value of
.060. Thus there is evidence that the Hunter L lightness variable and the panel
score variable are positively associated.

Table 3. $\xi(X_i,X_j,Y_i,Y_j)$ values for canned tuna data

To apply the large sample approximation we compute, from (16), K* = 1.67
yielding an approximate P value of .048. This is in good agreement with the exact
P value of .060 based on K.
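
The pairwise values that enter K can be regenerated from the Table 1 data. The sketch below computes the 36 values for i < j, sums them to obtain K, and then evaluates (13) and (16); the variable names are chosen for illustration, and the sign function stands in for the concordance indicator (no ties occur in these data).

```python
import math

# Table 1: Hunter L values (X) and panel scores (Y) for the nine lots.
X = [44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1]
Y = [2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8]
n = len(X)

def sign(v):
    return (v > 0) - (v < 0)

# Concordance indicators keyed by (i, j) with 1-based lot numbers, i < j.
pair_values = {(i + 1, j + 1): sign((X[i] - X[j]) * (Y[i] - Y[j]))
               for i in range(n - 1) for j in range(i + 1, n)}

K = sum(pair_values.values())                            # equation (14)
rk = 2 * K / (n * (n - 1))                               # equation (13)
K_star = K / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)   # equation (16)
p_approx = 0.5 * math.erfc(K_star / math.sqrt(2))        # upper-tail standard normal probability

print(pair_values[(2, 5)], K, round(rk, 2), round(K_star, 2), round(p_approx, 3))
```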